Database Systems Comparison

MySQL
PostgreSQL
DynamoDB
MongoDB
Cassandra
Comparison Matrix
Decision Framework
Common Interview Questions

MySQL

What is MySQL?

MySQL is a relational database management system (RDBMS) based on SQL (Structured Query Language). It's one of the most popular open-source databases, known for its reliability, ease of use, and strong community support.

Key Characteristics

ACID Compliance: Ensures data integrity through Atomicity, Consistency, Isolation, Durability
SQL-based: Uses standard SQL for querying and data manipulation
Storage Engines: Supports multiple storage engines (InnoDB, MyISAM, Memory, etc.)
Replication: Master-slave and master-master replication support
Partitioning: Horizontal partitioning capabilities
Indexing: B-tree indexes, full-text indexes, spatial indexes

Capabilities

Transactions: Full ACID transaction support with InnoDB
Joins: Complex JOIN operations across multiple tables
Foreign Keys: Referential integrity constraints
Views: Virtual tables for data abstraction
Stored Procedures: Server-side programming capabilities
Triggers: Event-driven programming

When to Use MySQL

Web Applications: Traditional web apps with structured data
E-commerce: Order management, inventory, user accounts
Content Management: Blogs, CMS systems
Financial Applications: When ACID compliance is critical
Read-Heavy Workloads: With proper read replicas
Small to Medium Scale: Up to millions of records with proper optimization

Pros

Mature and stable
Large community and ecosystem
Excellent documentation
Cost-effective (open source)
Good performance for read-heavy workloads
Wide hosting support

Cons

Limited scalability for very large datasets
JSON support is basic compared to NoSQL
Complex sharding requirements for horizontal scaling
Performance can degrade with complex queries on large datasets

PostgreSQL

What is PostgreSQL?

PostgreSQL is an advanced open-source relational database that emphasizes extensibility and standards compliance. Often called "Postgres," it's known for its robustness, feature richness, and support for both relational and non-relational data.

Key Characteristics

Object-Relational: Supports both relational and object-oriented features
ACID Compliant: Strong consistency guarantees
Extensible: Custom data types, operators, and functions
Standards Compliant: Closely follows SQL standards
Multi-Version Concurrency Control (MVCC): High concurrency without locking
Advanced Data Types: JSON, Arrays, XML, Geographic data

Capabilities

Complex Queries: Advanced SQL features, window functions, CTEs
JSON/JSONB: Native JSON support with indexing
Full-Text Search: Built-in text search capabilities
Geospatial: PostGIS extension for geographic data
Custom Functions: Support for multiple programming languages
Parallel Queries: Parallel execution for large queries
Logical Replication: Selective data replication

When to Use PostgreSQL

Complex Applications: Applications requiring advanced SQL features
Data Analytics: Complex reporting and analytical queries
Geospatial Applications: Location-based services with PostGIS
JSON-Heavy Workloads: When you need both relational and document features
High Concurrency: Applications with many concurrent users
Data Integrity Critical: Financial, healthcare, government systems

Pros

Feature-rich and powerful
Excellent JSON support
Strong consistency and ACID compliance
Highly extensible
Great for complex queries
Active development community

Cons

Steeper learning curve
Higher resource consumption than MySQL
More complex to tune and optimize
Slower for simple read operations compared to MySQL

DynamoDB

What is DynamoDB?

Amazon DynamoDB is a fully managed NoSQL database service provided by AWS. It's designed for applications that need consistent, single-digit millisecond latency at any scale.

Key Characteristics

Managed Service: No server management required
NoSQL: Key-value and document data model
Serverless: Auto-scaling capabilities
Eventually Consistent: Default consistency model (strongly consistent reads available)
Global Tables: Multi-region replication
Single-Digit Latency: Consistent performance at scale

Capabilities

Automatic Scaling: Handles traffic spikes automatically
Global Secondary Indexes (GSI): Query flexibility
Local Secondary Indexes (LSI): Alternative sort keys
DynamoDB Streams: Change data capture
Point-in-Time Recovery: Backup and restore capabilities
Encryption: At rest and in transit
DAX: In-memory caching layer

When to Use DynamoDB

Serverless Applications: AWS Lambda-based architectures
Gaming: Leaderboards, player data, session management
IoT Applications: High-volume sensor data ingestion
Real-time Applications: Chat apps, live streaming
Mobile Applications: User profiles, app data synchronization
High Traffic Web Apps: When you need predictable performance

Pros

Fully managed (no ops overhead)
Seamless scaling
Predictable performance
Integration with AWS ecosystem
Pay-per-use pricing model
Built-in security features

Cons

Vendor lock-in to AWS
Limited query flexibility
No complex joins or transactions across items
Learning curve for data modeling
Can be expensive at scale
No SQL interface

MongoDB

What is MongoDB?

MongoDB is a document-oriented NoSQL database that stores data in flexible, JSON-like documents. It's designed for scalability, performance, and developer productivity.

Key Characteristics

Document-Based: Stores data as BSON documents
Schema Flexible: Dynamic schemas allow easy evolution
Horizontal Scaling: Built-in sharding support
Rich Query Language: Powerful querying capabilities
Aggregation Framework: Data processing and transformation
Replica Sets: High availability and data redundancy

Capabilities

Flexible Schema: No predefined schema required
Rich Queries: Complex queries with filtering, sorting, projections
Indexing: Compound, text, geospatial, and partial indexes
Aggregation Pipeline: Data transformation and analytics
GridFS: Store and retrieve large files
Change Streams: Real-time data change notifications
Transactions: ACID transactions (MongoDB 4.0+)

When to Use MongoDB

Content Management: Blogs, news sites, catalogs
Real-time Analytics: Event logging, user behavior tracking
Product Catalogs: E-commerce with varying product attributes
Content Delivery: Media, gaming, social platforms
Rapid Prototyping: When schema requirements are unclear
Microservices: Service-specific data storage

Pros

Flexible schema design
Developer-friendly JSON-like documents
Excellent horizontal scaling
Rich querying capabilities
Strong community and ecosystem
Good performance for read/write operations

Cons

Memory intensive
No joins between collections (limited)
Eventual consistency in sharded environments
Complex sharding setup and management
Data duplication often required

Cassandra

What is Cassandra?

Apache Cassandra is a distributed NoSQL wide-column database designed for handling large amounts of data across many commodity servers. Originally developed by Facebook and later open-sourced, Cassandra provides high availability with no single point of failure.

Key Characteristics

Wide-Column Store: Data organized in column families (similar to tables)
Masterless Architecture: All nodes are equal, no single point of failure
Distributed: Data automatically distributed across multiple nodes
Eventually Consistent: Tunable consistency levels
Linear Scalability: Performance increases linearly with added nodes
Fault Tolerant: Continues operating even when nodes fail

Capabilities

High Availability: 99.99% uptime with proper configuration
Massive Scalability: Handle petabytes of data across thousands of nodes
Tunable Consistency: Choose consistency level per query
Multi-Data Center: Built-in multi-datacenter replication
Time Series Data: Excellent for time-stamped data
CQL: Cassandra Query Language (SQL-like syntax)
Compaction: Automatic data optimization and cleanup

Architecture Concepts

Keyspace: Top-level namespace (similar to database)
Column Family/Table: Container for rows
Partition Key: Determines data distribution across nodes
Clustering Key: Determines data ordering within partition
Replication Factor: Number of copies of each data piece
Consistency Level: Trade-off between consistency and availability

When to Use Cassandra

Time Series Data: IoT sensors, monitoring, logs, metrics
High Write Throughput: Applications with heavy write loads
Global Distribution: Multi-region applications
Always-On Applications: Systems that cannot tolerate downtime
Large Scale Analytics: Big data processing and analytics
Event Logging: User activity tracking, audit trails
Content Management: Social media, news feeds, catalogs

Pros

Linear scalability (add nodes = more performance)
No single point of failure
Multi-datacenter support out of the box
Excellent write performance
Handles large datasets efficiently
Tunable consistency
Open source with enterprise support

Cons

Limited query flexibility (no joins)
Eventual consistency by default
Complex data modeling requirements
High operational complexity
Memory intensive
No ACID transactions across partitions
Steep learning curve

Data Modeling Principles

Query-First Design: Design tables based on queries, not entities
Denormalization: Duplicate data to avoid joins
Partition Key Design: Ensure even data distribution
Clustering Key: Order data for efficient range queries
Avoid Large Partitions: Keep partitions under 100MB

Comparison Matrix

Feature	MySQL	PostgreSQL	DynamoDB	MongoDB	Cassandra
Type	Relational	Relational	NoSQL (Key-Value)	NoSQL (Document)	NoSQL (Wide-Column)
ACID	✅ Full	✅ Full	❌ Limited	✅ Limited*	❌ No
Scaling	Vertical + Read Replicas	Vertical + Read Replicas	✅ Auto Horizontal	✅ Horizontal	✅ Linear Horizontal
Schema	Fixed	Fixed	Schema-less	Flexible	Wide-Column
Queries	SQL	Advanced SQL	Key-based + GSI	Rich Query Language	CQL (Limited)
Joins	✅ Complex	✅ Complex	❌ No	❌ Limited	❌ No
JSON Support	Basic	✅ Excellent	✅ Native	✅ Native	Limited
Managed Options	Cloud providers	Cloud providers	✅ Fully managed	Atlas (managed)	DataStax Astra
Cost	Low (open source)	Low (open source)	Pay-per-use	Medium	Medium (open source)
Learning Curve	Low	Medium	Medium	Low-Medium	High
Consistency	Strong	Strong	Eventual/Strong	Strong/Eventual	Tunable
Multi-DC	Manual setup	Manual setup	✅ Global Tables	Manual setup	✅ Built-in

*MongoDB supports ACID transactions within replica sets and sharded clusters (4.0+)

Decision Framework

Choose Cassandra when:

Need massive scale (petabytes of data)
High write throughput requirements
Multi-datacenter deployment
Always-on availability critical
Time series data or event logging
Can accept eventual consistency

Choose MySQL when:

Building traditional web applications
Need strong ACID compliance
Team is familiar with SQL
Budget is constrained
Read-heavy workloads with moderate write volume
Data fits well in relational model

Choose PostgreSQL when:

Need advanced SQL features
Working with both relational and JSON data
Require complex analytical queries
Need geographic/spatial data support
High concurrency requirements
Data integrity is paramount

Choose DynamoDB when:

Building on AWS ecosystem
Need predictable performance at any scale
Serverless architecture
Simple access patterns (key-value lookups)
Want fully managed solution
Traffic patterns are unpredictable

Choose MongoDB when:

Rapid application development
Schema requirements change frequently
Need horizontal scaling
Working with semi-structured data
Building content management systems
Microservices architecture

Common Interview Questions

1. "How do you handle scaling in each database?"

MySQL: Vertical scaling, read replicas, sharding (complex) PostgreSQL: Similar to MySQL, plus logical replication DynamoDB: Automatic horizontal scaling, partition key design critical MongoDB: Horizontal sharding, replica sets for availability Cassandra: Linear horizontal scaling, add nodes for more capacity

2. "When would you choose NoSQL over SQL?"

Consider NoSQL when:

Rapid development with changing requirements
Need to scale horizontally
Working with unstructured/semi-structured data
Simple query patterns
Need high availability over consistency

3. "How do you ensure data consistency?"

SQL Databases (MySQL/PostgreSQL): ACID transactions, foreign keys DynamoDB: Eventually consistent by default, strongly consistent reads available MongoDB: ACID transactions within replica sets, eventual consistency in sharded setups Cassandra: Tunable consistency (ONE, QUORUM, ALL), eventual consistency by default

4. "What are the trade-offs between these databases?"

Consistency vs Availability: SQL databases favor consistency, NoSQL often favors availability Flexibility vs Structure: NoSQL offers schema flexibility, SQL provides data structure Scalability vs Complexity: NoSQL scales easier but can be more complex to model Query Power vs Performance: SQL offers complex queries, NoSQL offers better performance for simple operations

5. "How would you design a system that needs both ACID compliance and horizontal scaling?"

Use PostgreSQL with read replicas and careful partitioning
Consider NewSQL databases (Google Spanner, CockroachDB)
Implement application-level sharding with MySQL
Use MongoDB with proper transaction design
Consider microservices with database per service pattern

6. "Compare Cassandra with other NoSQL databases"

Cassandra vs MongoDB:

Cassandra: Better for write-heavy, time series data, linear scaling
MongoDB: Better for complex queries, flexible documents, easier development

Cassandra vs DynamoDB:

Cassandra: Open source, multi-cloud, more control over infrastructure
DynamoDB: Fully managed, simpler operations, tight AWS integration

7. "When would you choose Cassandra for a system design?"

Choose Cassandra when:

Building IoT data ingestion system (millions of sensors)
Time series analytics (metrics, monitoring, logs)
Global social media platform (activity feeds)
Financial trading systems (high write throughput)
Always-on services (99.99% uptime requirement)

Example Architecture:

IoT Sensors → Load Balancer → Cassandra Cluster → Analytics Pipeline
                                  ↓
                              Multi-DC Replication

Table of Contents​

MySQL​

What is MySQL?​

Key Characteristics​

Capabilities​

When to Use MySQL​

Pros​

Cons​

PostgreSQL​

What is PostgreSQL?​

Key Characteristics​

Capabilities​

When to Use PostgreSQL​

Pros​

Cons​

DynamoDB​

What is DynamoDB?​

Key Characteristics​

Capabilities​

When to Use DynamoDB​

Pros​

Cons​

MongoDB​

What is MongoDB?​

Key Characteristics​

Capabilities​

When to Use MongoDB​

Pros​

Cons​

Cassandra​

What is Cassandra?​

Key Characteristics​

Capabilities​

Architecture Concepts​

When to Use Cassandra​

Pros​

Cons​

Data Modeling Principles​

Comparison Matrix​

Decision Framework​

Choose Cassandra when:​

Choose MySQL when:​

Choose PostgreSQL when:​

Choose DynamoDB when:​

Choose MongoDB when:​

Common Interview Questions​

1. "How do you handle scaling in each database?"​

2. "When would you choose NoSQL over SQL?"​

3. "How do you ensure data consistency?"​

4. "What are the trade-offs between these databases?"​

5. "How would you design a system that needs both ACID compliance and horizontal scaling?"​

6. "Compare Cassandra with other NoSQL databases"​

7. "When would you choose Cassandra for a system design?"​

Table of Contents

MySQL

What is MySQL?

Key Characteristics

Capabilities

When to Use MySQL

Pros

Cons

PostgreSQL

What is PostgreSQL?

Key Characteristics

Capabilities

When to Use PostgreSQL

Pros

Cons

DynamoDB

What is DynamoDB?

Key Characteristics

Capabilities

When to Use DynamoDB

Pros

Cons

MongoDB

What is MongoDB?

Key Characteristics

Capabilities

When to Use MongoDB

Pros

Cons

Cassandra

What is Cassandra?

Key Characteristics

Capabilities

Architecture Concepts

When to Use Cassandra

Pros

Cons

Data Modeling Principles

Comparison Matrix

Decision Framework

Choose Cassandra when:

Choose MySQL when:

Choose PostgreSQL when:

Choose DynamoDB when:

Choose MongoDB when:

Common Interview Questions

1. "How do you handle scaling in each database?"

2. "When would you choose NoSQL over SQL?"

3. "How do you ensure data consistency?"

4. "What are the trade-offs between these databases?"

5. "How would you design a system that needs both ACID compliance and horizontal scaling?"

6. "Compare Cassandra with other NoSQL databases"

7. "When would you choose Cassandra for a system design?"